The goal of this analysis is to determine if the correlations between different trophic guilds differ between protected and protected sites. Specifically, we are modeling the observational data 5-years post MPA implementation.

Traditional SEM allow for categorical*continuous interactions. However, piecewise SEM’s cannot yet accomodate a global interaction (e.g. MPA_status x everything else). Therefore, I am following Douma and Shipley (2021, Ecosphere). They present a reproducible procedure through which to test for global interactions by imposing sequential contraints on a system of linear regressions.

This documents represents a summary of lots of previous modeling efforts and therefore reflects what is in my opinion the best path forward in terms of the publication.

The overall hypothesis

In general, we are interested in testing if protection staus alters the abundance of each trophic level AND the relationship between each guild and the guild that it consumes. In other words, we for example want to understand not only if purple urchin abundance is lower in MPAs, but if the relationship between lobster and purple urchins is more negative in MPAs (e.g. evidence that lobster are exerting more pressure on purple urchin population inside the MPAs). Here is a path diagram, known as a directed acrylic diagram (DAG) in the parlance of SEMs.

Figure 1. Hypothesized directed acrylic diagram for the interactions between trophic levels. Black lines represents changes in species abundance due to protection (e.g. an “intercept” effect”). Blue and red lines represent path coefficients that are allowed to vary by protection status. This DAG represents the core hypothesized causal diagram that we want to test.

We can compare Figure 1 to an alternative DAG in which the path coefficients between species are not allowed to vary (Figure 2). At its core the bulk of this analysis tests which of these DAGs is a better representation of the data.

Figure 2. Hypothesized directed acrylic diagram for the interactions between trophic levels. Black lines represents changes in species abundance due to protection status, or the effect of one species on another.

It is important to note, that for the SEM’s we need each individual linear model to be modeling the same subset of the data. Therefore, we have to a priori limit the data set to only observations that are present across all predictors, responses, and grouping variables. In other words, I had to drop all rows in the data set with NA. This caused certain survey’s to be entirely excluded, like the LTER lobster surveys as those surveys do not also include data on purple urchin, sheephead, etc.

The key aspect of this analysis is that we are testing specific DAGs that represent hypothesized causal topologies. And we can only test the hypothesized causal topologies against a more-saturated model. In other words the hypothesized model must be nested within a more saturated model. Shipley and Douma (2020) provide an excellent overview for how to develop saturated models with free parameters against which to test hypothesized DAGs.

Here I will start by writing out the fixed effects model structures for our hypothesized DAGs:

DAG 1:

This hypothesized causal topology predicts that all endogenous variables can express a different intercept based on MPA status. It also predicts that there will be differences in the path coefficients based on status for all higher trophic levels on kelp, and lobster / sheephead impacts on urchin. Figure 1 depicts DAG 1.

$$ \[\begin{align} kelp &\sim \beta_0 + \beta_1Status + \beta_2Purple + \beta_3PurpleStatus + \beta_4Red + \beta_5RedStatus+\beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ purple &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ red &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ lob &\sim \beta_0 + \beta_1Status\\ sheep &\sim \beta_0 + \beta_1Status\\ \end{align}\] $$

DAG 2:

An alternative hypothesis could be that only the intercepts vary by status and the coefficients for differences in path coefficients are fixed (at zero). However in this DAG we still assume the same directed paths between endogenous variables. Figure 2 depicts DAG 2.

$$ \[\begin{align} kelp &\sim \beta_0 + \beta_1Status + \beta_2Purple + 0PurpleStatus + \beta_4Red + 0RedStatus+\beta_6Lob+0LobStatus + \beta_8Sheep+0SheepStatus \\ purple &\sim \beta_0 + \beta_1Status + \beta_6Lob+0LobStatus + \beta_8Sheep+0SheepStatus \\ red &\sim \beta_0 + \beta_1Status + \beta_6Lob+0LobStatus + \beta_8Sheep+0SheepStatus \\ lob &\sim \beta_0 + \beta_1Status\\ sheep &\sim \beta_0 + \beta_1Status\\ \end{align}\] $$ In this sense it’s easy to see that DAG 2 is nested withing DAG 1. Thus we can use the Shipley and Douma chi-square test to evaluate model fit.

DAG 3:

We might also believe that there is not direct effect of predators on kelp. Rather the effects of predators are conditional on kelp via their effects on urchins. Here we can change the DAG to reflect that hypothesis. See Figure 3.

Figure 3. DAG 3

$$ \[\begin{align} kelp &\sim \beta_0 + \beta_1Status + \beta_2Purple + \beta_3PurpleStatus + \beta_4Red + \beta_5RedStatus+0Lob+0LobStatus + 0Sheep+0SheepStatus \\ purple &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ red &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ lob &\sim \beta_0 + \beta_1Status\\ sheep &\sim \beta_0 + \beta_1Status\\ \end{align}\] $$

Again, here I’ve used zeros to FIX the parameters so that we can see that this DAG is nested within DAG 1.

You could imagine that there are many different causal topologies that we could test that are nested within DAG 1. We can and should explore more of these causal hypothesis. However, for times sake I’ll only test a few that I think are most relevant.

DAG 4:

We could also create simplier DAG’s where the effects of predators/consumers are more specific. For instance, here we assume that the effect of purple urchins on kelp differs by status. And that that effect is due to differences in lobster impacts on purple urchins by status (Figure 4).

Figure 4. DAG 4

$$ \[\begin{align} kelp &\sim \beta_0 + \beta_1Status + \beta_2Purple + \beta_3PurpleStatus + \beta_4Red + 0RedStatus+0Lob+0LobStatus + 0Sheep+0SheepStatus \\ purple &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+0SheepStatus \\ red &\sim \beta_0 + \beta_1Status + \beta_6Lob+0LobStatus + \beta_8Sheep+0SheepStatus \\ lob &\sim \beta_0 + \beta_1Status\\ sheep &\sim \beta_0 + \beta_1Status\\ \end{align}\] $$

Dag 5:

In this DAG protection status only has direct causal links with sheephead, lobster, and red urchin. Each of these species is commercially and recreationally harvested. Thus, we would predict that MPAs would increase the abundance (e.g. the intercept) of each of these guilds. Increases in the abundance (intercept) of predators could result in changes in the relationship between the predators and the urchin guilds. Thus this DAG should allows for differences in the path coefficients by MPA status between predators and prey.

Figure 5. DAG 5

$$ \[\begin{align} kelp &\sim 0+ 0Status + \beta_2Purple + 0PurpleStatus + \beta_3Red + 0RedStatus+0Lob+0LobStatus + 0Sheep+0SheepStatus \\ purple &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ red &\sim \beta_0 + \beta_1Status + \beta_6Lob+\beta_7LobStatus + \beta_8Sheep+\beta_9SheepStatus \\ lob &\sim \beta_0 + \beta_1Status\\ sheep &\sim \beta_0 + \beta_1Status\\ \end{align}\] $$

DAG 6:

Finally, as somewhat of a “null” model, we can test the hypothesis that all trophic levels are cuasally unrelated. And the only driver is MPA status.

Figure 6. DAG 6

$$ \[\begin{align} kelp &\sim \beta_0 + \beta_1Status + 0Purple + 0PurpleStatus + 0Red + 0RedStatus+0Lob+0LobStatus + 0Sheep+0SheepStatus \\ purple &\sim \beta_0 + \beta_1Status + 0Lob+0LobStatus + 0Sheep+0SheepStatus \\ red &\sim \beta_0 + \beta_1Status + 0Lob+0LobStatus + 0Sheep+0SheepStatus \\ lob &\sim \beta_0 + \beta_1Status\\ sheep &\sim \beta_0 + \beta_1Status\\ \end{align}\] $$

We should note that each model is nested within DAG 1. Subsequent models are NOT (I don’t think???) however necessarily nested within the previous model. For instance, DAG 3 is not nested within DAG 2 BUT DAG 5 is nested within DAG 3.

Fit the models and compare

Now that we have each of the hypothesized DAGs we can run each “model” (actually a system of 5 mixed effects models). For instance here is the code for DAG 1:

library(lme4)
## Warning: package 'lme4' was built under R version 4.3.1
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.3.1
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
# DAG 1

# Model 1: path coefficients vary between mpa and fished
dag1.1 <- lmer(kelp ~ status*(purple + red + lob + sheephead) + (1|site) + (1|year) + (1|source), data = df1, REML = F)
dag1.2 <- lmer(red ~ status*(lob + sheephead) + (1|site) + (1|year) + (1|source), data = df1, REML = F)
dag1.3 <- lmer(purple ~ status*(lob + sheephead) + (1|site) + (1|source), data = df1, REML = F)
dag1.4 <- lmer(sheephead ~ status + (1|site) + (1|year) + (1|source), data = df1, REML = F)
dag1.5 <- lmer(lob ~ status + (1|site) + (1|year) + (1|source), data = df1, REML = F)

dag1 <- list(dag1.1, dag1.2, dag1.3, dag1.4, dag1.5)

We can use the Douma and Shipley chi-square statistic to compare each model (Table 1). Interpreting the p-value associated with teh chi-square statistic is different than in traditional regression modeling. A p-value > 0.05 would suggest that a nested model is a better representation of the data than the more saturated model. Therefore, we find that we cannot assume that the saturated model (DAG 1) is not a better representation of the data. In other words the DAG 1 with varying path coefficients is the selected model.

Table 1. Key statistics associated with each DAG. Each model represents a DAG that includes 5 mixed-effects models depicting the different hypothesized paths. Models 2:6 are nested within Model 1. Therefore each is statistically compared to model 1 using the chi-square statistic.
Model Model description Chi-square value p-value AIC
1 Path coefficients vary NA NA -387.28
2 Path coefficients are fixed 22.76 0.00 -380.53
3 Modification 3 12.78 0.01 -382.51
4 Modification 4 18.59 0.00 -378.70
5 Modification 5 37.81 0.00 -363.48
6 Only status has causal impact 128.60 0.00 -290.68

Explore the selected model

So collecting coefficients from the selected DAG can be a bit tedius because you need to decipher each individual model component. Figure 7 (below) represents the coefficient estimates and significance levels of the various paths.

Figure 7. Selected model with coefficients.

Few important points about this figure. These ARE not standardized coefficients. I can derive those for the final figure if needed. I’ve only included coefficients for paths which were significant (\(\alpha < 0.05\)). In two instances, I included coefficients with an asterisk if \(0.05 < \alpha < 0.06\). Paths are included in the model but that are NOT significant are dashed. Significance tests for path coefficients represent a test of if the coefficient for a path differs from zero (versus if the coefficient of the protected vs. fished path differs). Such post-hoc tests can be derived from the summary tables of the individual models. Finally, the direct paths of protection status on each species are solid if there was a significant difference in species abundance by protection status. Coefficients on these paths represent deviations from the MPA state. Thus negative values suggest less in the fished sites (more in the protected sites).

Partial regression plots

Few caveats

Obviously, assuming gaussian distributed error is problematic for 0-1 constrained data. I’ve rerun all the analyses with more complex models using a beta distribution (in glmmTMB) with the same fixed and random effects structures. In general the results are the same. We still settle on the same model. And exploration of the coefficients reveals the same patterns (qualitatively, the coefficient estimates differ because they are on the scale of the link function which is a logit for the beta family). I can work on deriving the standardized coefficient estimates and scaling the size of the arrows by the coefficient value if that is helpful. Another option is we can make the plot a two panel with one panel for only MPA specific coefficients and the other for reference sites.

Also I think we should chat because some of the “main” effects in this analysis (e.g. differences in species abundances by status) seem to conflict with what you have in the paper currently. I don’t really understand why… But I looked and it seems you are using a package for mixed-effects models that I haven’t heard of. Could be that they are treating the analysis different???

Also I’ve been back and forth w/ Bob Douma on the interpretation of the p-values and how to run these multigroup analyses. He recommends starting from the fully saturated model. E.g. a model with every possible path. However, we don’t have an apriori hypothesis for if purple should impact red or it red should impact purple. Unfortunately, I can’t figure out how to incorporate those types of correlated errors in the model. Therefore, we shouldn’t place supreme confidence in this model.

Scrap